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Next generation sequence data provides valuable information and tools for genetic and 
genomic research and offers new insights useful for marker development. This data 
is useful for the design of accurate and user-friendly molecular tools. Common bean 
{Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened 
in each gene pool followed by race and market class diversification that has resulted in 
different morphological characteristics in each commercial market class. This has led to 
essentially independent breeding programs within each market class which in turn has 
resulted in limited within market class sequence variation. Sequence data from selected 
genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) 
were used to develop InDel-based markers specific to each market class. Design of the 
InDel markers was conducted through a combination of assembly, alignment and primer 
design software using 1.6x to 5.1 x coverage of lllumina GAM sequence data for each of 
the selected genotypes. The procedure we developed for primer design is fast, accurate, 
less error prone, and higher throughput than when they are designed manually. All InDel 
markers are easy to run and score with no need for PGR optimization. A total of 2687 InDel 
markers distributed across the genome were developed. To highlight their usefulness, 
they were employed to construct a phylogenetic tree and a genetic map, showing that 
InDel markers are reliable, simple, and accurate. 
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INTRODUCTION 

Plant breeding embraces both art and science for crop improve- 
ment. Marker assisted selection (MAS) can boost the efficiency of 
breeding when markers linked to genes of interest are discovered 
(Yang et al, 2012). Marker development requires the comparison 
of genetic material of two or more genotypes to find the poly- 
morphic regions that segregate in a breeding population. Thus, it 
is important to have adequate genetic variation among the geno- 
types of interest to develop markers that can be used for MAS 
and other genetic studies. In fact, using MAS to improve a trait of 
interest in a self-pollinating species like common bean is becom- 
ing more challenging today in the United States because of the 
narrow genetic diversity of this species (McClean et al., 1993; 
Sonnante et al., 1994). 

Common bean is a diploid legume species with 11 chro- 
mosomes, a genome size of approximately 520 megabase pairs 
(Mbp), a few duplicated loci (Vallejos et al., 1992; Freyre et al., 
1998) and 49% transposable elements (Schlueter et al, 2008). 
A reference genome sequence of genotype G19833 was recently 
released in August (Schmutz et al, in press). Common bean has 
two distinct gene pools, Middle American (from northern Mexico 
to Colombia) and Andean (from southern Peru to northwestern 
Argentina). Each gene pool underwent separate domestication 



events (Gepts and Bliss, 1986; Koenig and Gepts, 1989; Khairallah 
et al, 1990, 1992; Koinange and Gepts, 1992; Freyre et al, 1996) 
followed by the creation of ecogeographic races in each of the two 
gene pools due to further selection under domestication (Singh 
et al, 1991; Beebe et al, 2000; Diaz and Blair, 2006; Mamidi et al, 
2011). Mamidi et al. (2011) estimated the duration and time of 
the single domestication event in each gene pool and suggested 
reciprocal migration between wild and landrace genotypes. There 
is a strong genetic differentiation between Middle American and 
Andean gene pools, and the Middle American gene pool is more 
diverse compared to the Andean gene pool (Mamidi et al., 2013). 
According to Singh et al. (1991), the Middle American gene pool 
with the center of domestication in Central and North America 
consist of three races, Durango, Jalisco, and Mesoamerican. The 
typical commercial market classes in the United States from this 
gene pool are pinto, great northern (GN), small red and pink 
beans from race Durango, and navy, small white and black beans 
from race Mesoamerican. The Andean gene pool with its center 
of domestication in South America includes three races: Nueva 
Granada, Peru, and Chile. The commercial market classes of this 
gene pool in the United States are light red kidney (LRK), dark 
red kidney (DRK), white kidney, and cranberry beans which are 
all from the Nueva Granada race (Mensack et al., 2010). 
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Breeding for commercial varieties in common bean usually 
occurs within each market class in order to retain their preferred 
seed size, shape, color, and pattern. The narrow genetic diver- 
sity within a market class is due to the small size of bottleneck 
populations (Gepts and Bliss, 1986), the rigid quality required 
by processors and consumers (Ghaderi et al., 1984; Wang et al., 
1988; Hosfield et al., 2000; Myers, 2000), the finite use of exotic 
germplasm (Silbernagel and Hannan, 1988, 1992; Miklas, 2000), 
and the restricted breeding strategies to meet consumer satisfac- 
tion regarding seed size, shape, and color (Singh, 1992). Although 
incorporation of exotic and unadapted germplasm is helpful in 
enhancement of genetic diversity, maintenance of the necessary 
phenotypic characteristics of each market class is challenging 
when using novel sources of variation due to linkage drag. Indeed, 
it has been documented in multiple plant species that quantitative 
traits are affected by genetic background (Tanksley and Hewitt, 
1988; Doebley et al, 1995; Lark et al, 1995; Cockerham and Zeng, 
1996; Li et al, 1997, 1998). This indicates the need for market 
class specific marker development to facilitate bean improvement 
by monitoring the variation that exists in each market class. 

Most of the currently available markers for common bean 
are Sequence Characterized Amplified Region (SCAR) markers 
developed from Random Amplification of Polymorphic DNA 
(RAPD) markers through a slow and difficult process (Kelly et al., 
2003). Other types of marker systems have been developed and 
used in different studies in common bean such as Inter Simple 
Sequence Repeats (ISSR) (Gonzalez et al., 2005), Simple Sequence 
Repeats (SSR) (Blair et al., 2003; Gomez et al., 2004; Buso et al, 
2006; Galeano et al., 2009; Cordoba et al, 2010). Recently a 
high-throughput Golden Gate SNP assay was released by Hyten 
et al. (2010). However,most of the markers are based on poly- 
morphism among a few genotypes from different market classes 
or even gene pools. Thus, the development and application of 
high throughput, user-friendly, market class-specific markers are 
indispensable. 

Insertion-deletions (InDel) are one of the common sources 
of variation that are distributed widely throughout the genome. 
Mechanisms such as transposable elements, slippage in simple 
sequence replication, and unequal crossover can result in the for- 
mation of InDels (Britten et al, 2003). They can be converted to 
user-friendly markers that can be distinguished easily based on 
their size (Vali et al, 2008) with minimum laboratory equipment. 
Many genetic studies in plants and animals have successfully uti- 
lized InDels (Hayashi et al, 2006; Vali et al, 2008; Vasemagi 
et al., 2010; Ollitrault et al, 2012). InDels and SNPs are now the 
most widely used marker systems in Arabidopsis because they are 
abundant, PCR-based, and informative due to their co-dominant 
nature (Pacurar et al, 2012). 

Next generation sequencing (NGS) provides inexpensive 
sequence data needed to develop genetic markers to be used in 
plant breeding and genetic studies. NGS technologies are efficient 
and offer new genomic information for minor crops for which 
a reference genome sequence is not available (Varshney et al, 
2009) and accelerates the development of genomic resources for 
crops with a reference genome. The objective of this study was to 
use lUumina sequence data from multiple genotypes within five 
bean market classes, which were selected from both the Andean 



and the Middle American gene pools, to develop user-friendly 
InDel markers that have wide applications for MAS and genomic 
studies. 

MATERIALS AND METHODS 
PLANT MATERIALS 

Three diverse genotypes from pinto, navy, black, and LRK and 
two genotypes from DRK market classes where sequenced with 
the lUumina Genome Analyzer (GAII). Genotypes were cho- 
sen based on their divergence in a neighbor joining (NJ) tree 
that was created for 192 genotypes from nine different market 
classes using 1159 high quality SNP markers (Hyten et al., 2010). 
The sequenced genotypes in each market class were as follows: 
Stampede, Buckskin, and Sierra from the pinto market class; C20, 
Michelite, and Laker from the navy market class; Cornell 49242, 
T-39, and UI 906 from the black market class, California Early, 
Lark and, Kardinal from the LRK market class, and Red Hawk 
and Fiero from the DRK market class. 

MARKER DEVELOPMENT 

The first step of marker design was a within genotype de novo 
assembly of the lUumina GAII DNA sequence data into con- 
tigs using Velvet 1.0 (Zerbino and Birney, 2008) with the default 
settings. BLAST-h (Camacho et al., 2009) was used to discover 
potential InDels. Using three genotypes within a market class, 
InDels were discovered by aligning contigs from one genotype 
as the query against a database consisting of the contigs of the 
two other genotypes. In addition, a pairwise blastn alignment 
was performed among all pairs of genotypes within a market 
class. An e-value cutoff of lE-50 and a maximum hit of one and 
two were used in BLAST to obtain the best hit for pair-wise and 
three-way alignments, respectively. InDels were discovered based 
on the size differences between the query and the database sub- 
ject. Several filters were applied to potential InDels: A minimum 
InDel size of 8 bp was used to ensure an appropriate resolution 
using agarose gel electrophoresis; unique InDel fragments were 
assured by blasting InDel fragments against the Phaseolus vulgaris 
L. scaffold assembly ARRA-V0.9. ARRA-V0.9 was an intermedi- 
ate scaffold assembly in the whole genome sequencing project. 
The e-value and maximum hit were set to lE-50 and 20, respec- 
tively. Queries with multiple hits were excluded to decrease the 
probability of designing markers from repetitive regions. Contigs 
that contained more than four consecutive Ns were excluded 
because this could lead to false InDel discovery or false InDel 
size. 

The contigs that contained InDels were aligned using Multalin 
(Corpet, 1988) to obtain the consensus region around the InDels 
for primer design. Primers were designed from the consensus 
sequence in BatchPrimer3 (You et al., 2008). The primer size 
parameters were set to 22, 26, and 32 bp as the minimum, opti- 
mum, and maximum size, respectively, and GC content was set 
to 35, 50, and 60% as the minimum, optimum, and maximum, 
respectively. Primer annealing temperature was set to a very nar- 
row range of 67, 68, and 69°C as the minimum, optimum, and 
maximum temperature, respectively. Finally, the maximum Tm 
difference between forward and reverse primers was set to 2°C 
only. The PGR product length was approximatlylO times the 
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InDel size to ensure the PGR products could be adequately sep- 
arated on a 3% agarose gel for efficient scoring. The product size 
varied between InDel size (bp) x 10 and [InDel size (bp) x 10] 
+ 10 (bp) for optimum and maximum values, respectively. The 
minimum product sizes were as listed in Table 1. These stringent 
criteria were intentional to avoid the need for PGR optimization 
for each primer set. All primer sets were optimized to amplify 
with a 55° G annealing temperature. 

NOMENCLATURE 

Including information on marker position on the reference 
genome in the marker names, provides the user with valu- 
able information on marker distribution. Thus, the markers 
were named in the format NDSU-IND-NN-XX.XXXX, where 
NDSU stands for North Dakota State University, IND for InDels 
and NN for the chromosome number. The Xs represent the 
physical position on the reference genome (G19833- Vl.O) in 
megabase pairs up to four decimals. For example InDel marker 
NDSU_IND_07_02.6485 is located on chromosome Pv07 at posi- 
tion 2.6485 Mbp in common bean Vl.O reference genome. 

PGR AMPLIFICATION 

The PGR protocol used to amplify all InDel markers was: 3 min at 
95° G for one cycle, 20 s at 95° G, 30 s at 55° G, and 1 min at 72° G 
for 45 cycles, and 10 min at 72°G for one cycle. Each PGR reac- 
tion consisted of a final concentration of 1 x PGR buffer including 
0.15 mM MgGb, 0.5 mM dNTP mk, 0.25 mM forward/reverse 
primers and 1 unit of Taq polymerase with a 20 [xl final volume. 

ALIGNMENT OF SEQUENCE DATA WITH THE REFERENCE GENOME 

The sequence data from 14 genotypes were mapped to the refer- 
ence genome (Vl.O) when the complete assembly became avail- 
able. Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009) 
with default settings was used to map the reads with the refer- 
ence genome. SAMtools (Li et al, 2009) was used to convert the 
BWA output to a sorted bam file and obtain the mpileup file. 
The "pileup2indel" command with minimum coverage of five 
reads was used in VarScan (Koboldt et al, 2009) to find the num- 
ber of InDels for each genotype based on the reference genome 
(G19833). The VarScan output was also filtered based on the 
frequency of the variant allele by read count. Only InDels with 
variant allele frequency of 80% and higher were considered. 



Table 1 | InDel size and the corresponding minimum product size that 
was used by BatchPrimerS for primer design. 



InDel size (bp) 

8-9 

10-11 

12-14 

15-17 

18-22 

23-26 

27-29 

30-36 



Minimum product size (bp) 

70 
80 
90 
100 
110 
120 
130 
150 



MARKER PERFORMANCE AND APPLICATION 

To evaluate the performance of the designed markers, 219 
pinto markers were screened on Stampede, Sierra, Buckskin, 
and G19833. Moreover, six markers were tested on a few 
random genotypes from nine market classes (Table 2) to 
evaluate the performance of InDel markers in the market 
classes other than the one from which they were originally 
designed. 

To assess the InDel markers for applied genetic studies, 196 
markers were used to screen 24 diverse pinto genotypes. The 
24 pinto genotypes were as follow: Sierra, Aztec, Santa Fe, La 
Paz, Stampede, ND-307, Medicine Hat, Lariat, BelDakMe-RR- 
5, Sequoia, Remington, Durango, Max, PT7-2, Ouray, JM-126, 
Olathe, Hatton, Apache, UI-114, Nodak, Buckskin, Flint, and 
UI-196. These genotypes were chosen based on their diversity 
in a NJ tree which was constructed in GlustalX 2.1 (Larkin 
et al, 2007) using 1300 SNP markers (Hyten et al, 2010). 
The number of bootstraps used in GlustalX was 1000 and the 
genotypes were chosen from clusters that were diverse and had 
high bootstrap values in the tree. The 24 pinto genotypes were 
screened using the 196 markers and the markers that showed 
polymorphism were used to evaluate the performance of the 
InDel markers for distinguishing the 24 genotype and to con- 
struct a NI tree. PowerMarker version 3.25 (Liu and Muse, 
2005) was used to construct the NJ tree with 1000 bootstraps. 
The _Fst value was calculated in PowerMarker as well to eval- 
uate the overall genetic divergence among this collection of 
genotypes. 

In addition, an F2 population was used to evaluate the InDel 
markers for mapping purposes. The F2 population, NDZ- 11002 
was derived from a cross between Lariat x Medicine Hat and 
consisted of 87 F2 genotypes. Eighty two pinto markers that 
were polymorphic between the two parents were employed to 
screen the F2 population, and GarthaGene (De Givry et al., 2005) 
was used to build the genetic map. In GarthaGene, the "group" 
command was used with a distance and LD threshold of 20 cM 
and 3.00, respectively to group markers into linkage groups. The 
"Buildfw 2 2 {} 1" command was used to order the markers 
on each linkage group and to obtain the map with the highest 
likelihood value. Qgene (Nelson, 1997) was then used to visualize 
the map images. 

MULTIPLEXING 

Multiplexing of InDel markers was conducted using two and 
four markers in the same reaction mix. A total of one duplex 
and six fourplex sets were tested on 96 Middle American 
genotypes. The protocol for four markers in a 20 \l\ reaction 
mix was as follow: Ix PGR Buffer including 0.15 mM MgGl2, 
0.8 mM dNTP, 0.125 mM of each forward and reverse primer 
[total of 0.5 mM (0.125 x 4) forward and reverse primers] 
and 2 units of Taq polymerase. The protocol for multiplex- 
ing two markers was the same as above with an exception 
that 0.25 mM of each forward and reverse primer [total of 
0.5 mM (0.25 x 2)] was used. The PGR amplification cycle 
was the same as the cycle used to amply a single marker. 
The resulting amplification products were visualized on a 3% 
agarose gel. 
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Table 2 | Genotypes used to test the performance of six markers in 
other marl<et classes. 



Order* 


Genotype 


Market class 


1 


LJUI 1 1 II ID 


Black 


2 




Black 


3 


T-39 


Black 


A 


Prirncill AQ9A9 


DIdCK 




Q K a n 1 a 
Ol Id 1 lid 


Black 




OICIUIx IXlll^IlL 


Black 


7 


RplMiNph-RMR-T 


prpat nnrthprn 

V-J I^GL IIL/ILII^III 


g 


N/lattGrhnrn 

IVIa LLCl I ILtl 1 1 


Prpat nnrthorn 

UlCGL IIUlLIIClll 


g 


Tara 


Prpat nnrthprn 

\J 1 C?OL 1 Iv^l LI \ I 1 


10 


v> oy n G 


prpat nnrthprn 

VJ lOQL IIL^ILIIC^III 


11 


JM-24 


prpat nnrthprn 

V-J IC7GL IIL/ILII^III 


12 


fnPm i ni 

VJ 1 1 III II 


Prpat nnrthprn 

V-J lOGL IIL^ILII^III 


lO 


IVI lljl Icll Lc 


Navy 


1^ 


od 1 1 1 IdU 




1R 


oodldl cl 




1R 






17 


P9n 


Navy 


If? 
1 0 






1Q 


Pint' F /~iv /H 

riiiN nuyU 


Pink 


20 




Pink 


91 




Pink 


22 


Roza 


Pink 


zo 




Pink 


24 


PK91 5 


Pink 


25 


AP RpHhnnrl 

rA"^ 1 ICUUUI lU 


Qmall rpri 

Ol 1 IG II 1 


26 


AP FarlirpH 

AAv.^ 1 a 1 III kjKX 


Qmall rprI 

0 1 1 1 G 1 1 1 t^LJ 


27 


'"^annh iro 
oci|jjj| lilt; 


Q m a 1 1 rp H 

Ol 1 IG II 1 


28 




Qmall rpH 
Ol 1 IG II 1 cu 


9Q 


UI-3 


Ol 1 Id II 1 cU 


30 


NW-R? 

1 N V U DO 


Q m a 1 1 rp H 
Ol 1 IG II 1 


31 




Pinto 


32 


R 1 ipk^k"! n 

LJ UvjIxO M 1 1 


Pinto 


33 


r)i 1 r^nnn 

l_/ U 1 G 1 1 ^ w 


Pinto 




PT7-2 


Pinto 


35 




Pinto 


OD 


Md LLUI 1 


Pinto 


O/ 


Phinnnl.- 9000 
*^IIIIIUUI\ zuuu 


LiyilL 1 cU NlUllcy 


oo 


K-42 


LiyilL 1 cU NlUIIcy 


39 


Blush 


1 inht rpH t'iHnp\/ 
1— lyiiL 1 cU ixiui Icy 




VA-19 


LiyilL IcU NlUIIcy 


41 


Montcalm 


Dark red kidney 


42 


USDK-CBB-15 


Dark red kidney 


43 


Fiero 


Dark red kidney 


44 


CDRK 


Dark red kidney 


45 


Benton 


Snap bean 


46 


91-G 


Snap bean 


47 


Harvester 


Snap bean 


48 


Cantar 


Snap bean 



Six random genotypes were selected from pinto, great northern, navy, black, 
pink, and small red market classes and four random genotypes were selected 
from dark and light red kidney and snap bean marker classes. 
*lndicates the order of the genotypes from left to right on the agarose gel in 
Figure 3. 



RESULTS 

ILLUMINA SEQUENCING, DE NOVO ASSEMBLY AND PRIMER DESIGN 

The DNA sequence data consisted of 19.6 billion bases in the 
form of 114 bp paired-end reads from 250 to 400 bp size selected 
fragments of 14 genotypes. The 114 bp paired-end reads did not 
include lUumina adaptor sequences. The sequencing coverage 
ranged from 1.6x to S.lx with an average of 3.7 x. Laker and 
Kardinal had the lowest and highest number of raw GAII reads, 
3,711,450 and 11,748,671 reads, respectively. Cornell 49242 with 
261,313 and Stampede with 752,015 contigs had the smallest and 
largest number of assembled contigs, respectively. Only contigs 
120 bp or greater were used for the assembly statistics. The mean 
contig length across all 14 genotypes was 322 bp. The N50 var- 
ied from 222 to 651 bp, with an average N50 value of 394 bp. The 
GC content ranged from 32.5 to 40.9% with an average of 34.4%. 
The minimum GC value of 35% was used to design primers in 
BatchPrimer3. The lUumina reads and assembly information are 
summarized in Table 3. 

Alignment of the reads with the VI. 0 reference genome for 
14 genotypes indicated a range of 2330 to 45,770 InDels of 1 
bp or greater across the genome. The minimum and maximum 
number of InDels belonged to the Laker and Buckskin genotypes, 
respectively (Table 4). 

The number of aligned contigs (BLAST output) within a mar- 
ket class varied between 296,852 and 859,350 in the three-way 
alignments. The aligned contigs were filtered based on the InDel 
size, and those with a tentative size of 8 bp or greater were 
retained. This significantly reduced the number of sequences to 
analyze in the next step. For example, the number of contigs 
in the DRK market class dropped from 296,852 to 1378, and 
the reduction was from 859,350 to 9634 for pinto market class. 
Filtering for uniqueness of hits in the common bean reference 
scaffolds reduced the range of contigs from 450 in DRK to 6010 
in pinto, with an average value of 2114 contigs across all market 
classes. The total number of consensus sequences submitted to 
BatchPrimer3 across all alignments (three-way and all pair-wise 
alignments) varied from 11,406 in the pinto to 324 in the DRK 
market class when fragments containing four consecutive Ns in 
their sequence were removed (Table 5). The basic properties of 
11,406 pinto contigs submitted to BatchPrimer3 are summarized 
in Table 6. 

The final number of primer pairs ranged from 1343 in the 
pinto market class (highest) to 60 in the DRK market class 
(lowest) (Tables). There were a total of 2687 primer pairs 
designed across all market classes (Supplementary Table), all hav- 
ing single hits on the common bean reference genome VI. 0 
and only six homologs to common bean transposable elements 
(Scott Jackson, personal communication). The distribution of 
InDel sizes in each market class is illustrated in Figure 1. 
The average distribution of InDel markers varied from one 
perl32Kb on chromosome Pv05 to one per 314 Kb on chromo- 
some Pv03 with an average of one InDel marker every 200 Kb 
across the genome. Although the euchromatic region forms less 
than half of the bean genome (44.1%), the marker density 
was higher in this region (65.8%) (Figure 2). Markers located 
in the pericentromeric region are highlighted in orange color 
in the Supplementary Table. The number of markers on the 
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Table 3 | lllumina paired-end reads information and contig information after the de novo assembly. 







lllumina GAM reads 






Assembly 






Genotype 


Market 


Number of reads 


Genome 


Total length of 


Number of 


N50 (bp) 


GC% 




class 


(In millions) 


coverage (x)* 


assembled contigs (Mbp)"*" 


assembled contigs 






Cornell 4yz4z 


Black 


7.20 


3.2 


66.97 


zbl ,3\3 


300 


40.9 


T-39 


Black 


8.42 


3.7 


178.16 


527,965 


422 


32.6 


Ul 906 


Black 


6.44 


2.8 


107.99 


A r\r\ en 

4UU,obz 


320 


33.3 


hied Hawk 


DRK 


7.20 


3.2 


128.33 


4z/,zdU 


360 


35.0 


Fiero 


UnN 


7.61 


3.3 


146.58 


4/U,byo 


373 


34.0 


California Early 


Lnl\ 


"7 CO 


3.3 


145.26 


AC'] OO 1 

4b 1 ,oo 1 


387 


34.2 


Lark 




10.30 


4.5 


199.12 


531 ,149 


501 


33.8 


Kardinal 


LRK 


11.74 


5.1 


238.71 


534,298 


651 


32.5 


C20 


Navy 


7.84 


3.4 


100.52 


355,778 


338 


37.6 


Michelite 


Navy 


9.47 


4.1 


170.02 


504,441 


426 


34.3 


Laker 


Navy 


3.71 


1.6 


81.10 


381,399 


222 


33.8 


Stampede 


Pinto 


10.20 


4.5 


214.71 


752,015 


330 


33.2 


Sierra 


Pinto 


9.58 


4.2 


178.36 


532,937 


409 


32.7 


Buckskin 


Pinto 


10.45 


4.6 


190.93 


529,875 


477 


33.7 



'The genome coverage was calculated based on a 521 Mbp genome size and 114 bp paired-end lllumina reads. 
+ Contigs 120 bp or greater were used for the assembly statistics. 



Table 4 | Number and distribution of InDels In each genotype when aligned with G19833. 



Genotype 


Market 
class 


Genome 
coverage (x) 


Number of InDels in 
alignment with G19833 


InDel frequency 
per Mbp 


Number of InDels 
In 1 X coverage 


Cornell 49242 


Black 


3.2 


9226 


17.7 


2901 


T-39 


Black 


3.7 


33,662 


64.6 


9049 


Ul 906 


Black 


2.8 


13,681 


26.3 


4817 


Red Hawk 


DRK 


3.2 


6899 


13.2 


2169 


Fiero 


DRK 


3.3 


8961 


17.2 


2667 


California Early 


LRK 


3.3 


7017 


13.5 


2114 


Lark 


LRK 


4.5 


17056 


32.7 


3749 


Kardinal 


LRK 


5.1 


27122 


52.1 


5226 


C20 


Navy 


3.4 


11,205 


21.5 


3238 


Michelite 


Navy 


4.1 


31,455 


60.4 


7525 


Laker 


Navy 


1.6 


2330 


4.5 


1421 


Stampede 


Pinto 


4.5 


37902 


72.8 


8404 


Sierra 


Pinto 


4.2 


32,199 


61.8 


7612 


Buckskin 


Pinto 


4.6 


45,770 


87.9 


9907 



The numbers are based on InDel size of 1 bp and greater, and 521 Mbp genome size. VarScan was used to discover the InDel polymorphisms. 



Table 5 | Filtering criteria for contigs used for primer design. 



Market classes 


Number of contigs with InDels in three-way alignment 


Numbers across all alignments 


Blast hits 


InDel size >8 


Unique scaffold hit/lnDel >8 


Contigs submitted to Batch primer3 


Number of primer pairs 


Pinto 


859,350 


9634 


6010 


11,406 


1343 


Black 


342,085 


1955 


708 


1913 


292 


Navy 


507147 


1515 


650 


1867 


323 


LRK 


834,726 


5524 


2754 


5456 


669 


DRK 


296,852 


1378 


450 


324 


60 




(pair-wise) 











The numbers are provided only for three-way alignment in all market classes except the dark red kidney market class. 
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chromosomes varied from 144 to 333 on chromosomes PvlO 
and Pv02, respectively, with an average value of 238 markers per 
chromosome. 

MARKER PERFORMANCE 

To evaluate the performance of the InDel markers, a total of 219 
markers from the pinto market class were tested with Stampede, 
Sierra, Buckskin, and G19833. A total of 196 markers showed 
polymorphism (89.5%), and only 23 markers (10.5%) were either 
monomorphic or difficult to score among four genotypes. Six 



Table 6 | Pre-analysis of 11,406 pinto contigs submitted to 
BatchPrimerS. 



Item 


Mean 


Std. 
deviation 


Min 


IVlax 


Coe. of 
variation (%) 


Sequence 


390.49 


174.45 


104 


2680 


44.67 


length (bp) 












GC contents 


29.04 


5.92 


4.45 


53.21 


20.38 


(%) 













markers from four market classes were tested on 48 random geno- 
types from nine different market classes as well. Although the 
primers were originally designed for a specific market class, we 
observed polymorphism among the genotypes from other mar- 
ket classes. Based on a sample of genotypes, markers from the 
Middle American gene pool did not show polymorphism in light 
and DRK market classes and the marker from the LRK mar- 
ket class showed polymorphism only among Andean genotypes 
(Figure 3). 

MARKER APPLICATION 

Phylogenetic analysis 

A set of 196 InDel markers were used to screen 24 diverse pinto 
genotypes (Table 7) and 172 (87.7%) were polymorphic and used 
in a phylogenetic analysis. The NJ tree and the fst value indicated 
two distinct clusters among the 24 pinto genotypes (Figure 4). 

Genetic map 

Eighty two polymorphic InDel markers where used to construct 
a genetic map. A total of nine linkage groups that correspond 
to nine chromosomes (all chromosomes except one and four) 
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FIGURE 1 I Distribution of 2687 InDle sizes in five marl<et classes. 



Frontiers in Plant Science | Plant Genetics and Genomics 



May 2014 | Volume 5 | Article 185 | 6 



Moghaddam et al. 



InDel marker development in common bean 



60- 
40- 
20- 
0- 

60- 

40- 

20- 



c 

o 
O 



60- 
40 
20- 
0- 

60- 
40- 
20- 



chr= 1 



— rr r 



chr = 4 



chr = 7 



"Ln— 



chr= 10 



I I I 



chr = 2 




iJ 



chr = 5 




chr = 8 



chr = 11 



I I 



chr = 3 



chr = 6 



chr = 9 



^ i I I I I I I I I 



0 10 20 30 40 50 60 0 10 20 30 40 50 60 0 10 20 30 40 50 60 

position 

FIGURE 2 I Physical distribution of 2687 InDel marl<ers across 11 chromosomes of common bean. The x axis sinews the chromosome length in Mbp and 
the y axis represents the frequency of InDel markers. The green rectangle indicates the pericentromeric region in each chromosome. 



were built from the F2 population with 87 genotypes. Five pairs 
of markers co-segregated in CarthaGene analysis. Among the 77 
remaining markers, 18 were excluded from the genetic map when 
the "Buildfw" function of CarthaGene with a LOD threshold of 
2.2 was used. The genetic and physical order was consistent for 54 
of the 59 marker loci (Figure 5). 

Multiplexing 

Multiplexing of tested InDel markers showed clear and scorable 
bands on the 3% agarose gel when one set of duplex and six sets of 
fourplex were used. Figure 6 illustrates the results of two fourplex 



sets on 48 Middle American genotypes on a 3% agarose gel as an 
example. The marker information is provided in Table 8. 

DISCUSSION 

Common bean is a diverse crop species with much variation in 
the seed color, shape, and many other phenotypic characteris- 
tics. The species includes wild types, landraces which are the 
domesticated forms, ecogeographical races which are the result 
of selection, and market classes within each of the ecogeographi- 
cal races. Plant breeding is generally restricted to market classes to 
retain the specific characteristics of the market class. However, as 
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Black GN Navy Pink Small red Pinto LRK DRK Snap 



200 bp 
100 bp 




200 bp 
100 bp 



FIGURE 3 1 Six InDel markers tested on random genotypes from 
nine different market classes. The names of the genotypes, 

from left to right, are listed in Table 2. (A) Marker 
NDSU_IND_07_02.6485 from pinto market class. (B) Marker 
NDSU_IND_10_42.1355 from pinto market class. (C) Marker 



NDSU_IND_06_12.3324 from light red kidney market class. (D) 
Marker NDSU_IND_05_01.7405 from pinto market class. (E) Marker 
NDSU_IND_08_36.2119 from navy market class. (F) Marker 
NDSU_IND_09_07.6278 from black market class. First lane from 
right in all panels is the DNA Ladder. 



Table 7 | Specifications of 24 pinto genotypes. 



Genotype 


Growth habit 


Application date 


Source 


PT7-2 


II to III 


Not released 


USDA-ARS- 








Washington 


Sequoia 


lib 


NA 


ISB 


Max 


III 


NA 


ISB 


Santa Fe 


Ha 


2010 


MSU 


Stampede 


lla 


2008 


NDSU 


La Paz 


lib 


2008 


ProVita.lnc. 


ND-307 


lib 


2008 


NDSU 


Lariat 


lib 


2008 


NDSU 


Medicine Hat 


lla 


2007 


Seminis 


Durango 


lib 


2007 


ProVita.lnc. 


Remington 


lib 


1996 


RogersSeedCo 


Hatton 


Ilia 


1996 


NDSU 


Buckskin 


lllb 


1995 


Novartis Seed 








Inc. 


Apache 


Ilia 


1995 


ISB 


Aztec 


lib 


1993 


MSU 


BelDakMi-RR-5 


II 


1993 


USDA-ARS- 








Beltsville-MD 


Sierra 


lib 


1990 


MSU 


UI-196 


lllb 


1990 


Ul 


Flint 


II to lllb 


1989 


RogersSeedCo 


JM-126 


III to Ilia 


1986 


USDA- 








ARS/WSU 


Nodak 


III 


1985 


USDA- 








ARS/NDSU 


Olathe 


III 


1980 


CSU 


Ouray 


III to Ilia 


1975 


CSU 


UI-114 


III 


1967 


Ul 



NA, Information not available; ISB, Idaho Seed Bean Company; MSU, Michigan 
State University; NDSU, North Dakota State University; Ul, University of Idaho; 
WSU, Washington State University; CSU, Colorado State University. 



indicated by the high polymorphism rate (87.7%) based upon our 
analysis of 24 genotypes of the pinto market class, InDel mark- 
ers appear to be polymorphic even within a market class. InDel 
markers are easy to use co-dominant markers and are present 
throughout the genome. With the availability of abundant next 
generation sequence data, identification of InDels has become a 
simple process. 

MARKER DEVELOPMENT 

We selected diverse genotypes in each market class based on the 
most comprehensive SNP dataset available. The lUumina GA II 
was used to generate paired end reads. The lUumina technology 
results in short reads but high coverage (Vera et al., 2008) as well 
as high quality data where 70% of base calls in 2 x 75 bp paired- 
end sequences have a quality score of Q30 or higher. The standard 
paired-end libraries of lUumina with a length between 200 and 
500 bp can provide a platform to identify large and small InDels, 
inversions and other rearrangements. Paired-end reads boost the 
robustness of de novo assembly, SNP identification, and InDel 
discovery. 

In this study we developed a genome wide collection of 2687 
InDel markers that can be amplified without any PGR optimiza- 
tion and with minimum lab equipment. One of the filtering 
criteria that dramatically decreased the number of markers was 
the InDel size. There was a 215-fold decrease in the number of 
potential InDels in DRK when the contigs from the BLAST out- 
put were filtered for a minimum InDel size of 8 bp, and this 
reduction was about 89-fold for the pinto market class in the 
three-way alignment. However, filtering for uniqueness of hits 
to the common bean reference scaffolds (ARRA-V0.9) did not 
cause a dramatic reduction. For example, the number of con- 
tigs in the DRK market class dropped to one third, and this 
reduction was only 1.6-fold for the pinto market class in the 
three-way alignment. Generally there were less InDels in the 
DRK market class possibly due to the presence of only two 
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FIGURE 4 1 Neighbor joining tree of 24 pinto genotypes that 
cluster into two distinct groups (i) newer varieties with type II 
growth habit and (ii) older varieties with type III growth habit. 



sequenced genotypes in this market class. Moreover, Andean 
types are reported to have narrower genetic diversity compared 
to the Mesoamerican genotypes (Beebe et al, 2001). The strin- 
gent primer design criteria also resulted in another huge drop 
in the number of primer pairs that were selected. These strin- 
gent criteria were necessary because the development of markers 
should be precise and cost effective with proper throughput 
(Jander et al, 2002). 

Several factors affect the discovery of functional InDel mark- 
ers. As observed in Arabidopsis, decreasing InDel size from 25 to 
6 bp increased the number of markers from 277 to 1073 (Salathia 
et al., 2007; Hou et al, 2010). The phylogenetic relationship 
between the genotypes used for InDel discovery is important. 
Kardinal, an Andean genotype, is more closely related to the 
reference genome (G19833), another Andean genotype, than 
Buckskin, a Middle American genotype. Less Kardinal InDels 
were observed than Buckskin InDels (27,122 vs. 45,770) even 
thought it had greater read coverage (5.1 x vs. 4.6x). This trend 
was observed for all genotypes: more InDels were discovered 
among Middle American genotypes than the Andean genotypes 
because the reference genome is of Andean origin (Table 4). 

In total we discovered 2687 InDel markers for an average of 
one per 200 Kb. The fact that they are preferentially distributed in 
the highly recombinogenic region of the genome increases their 
utility for multiple genetic analyses. 



The Fst value of 0.16 indicates the degree of variation between the 
two groups. Bootstrap values greater than 50% are shown on the 
nodes. 



MARKER APPLICATION 

In our study, 87.7% of the 196 markers that were used in 
the phylogenetic analysis of 24 pinto genotypes were poly- 
morphic. In the NJ tree, the pintos were separated based on 
plant architecture and the application/release date of the vari- 
ety. Indeed, newer, upright pintos clearly clustered separately 
from older, prostrate pintos with fixation index (_Fst) of 0.16 
which indicates a great degree of genetic divergence among sub- 
populations (Hartl and Clark, 1997). This might be due to 
selection for growth habit in bean breeding programs where 
the newer breeding programs prefer the upright beans since 
this trait offers several advantages such as ease of manage- 
ment, higher grain yield, and reduced disease issues (Cunha 
et al., 2005). InDel markers have been used for phylogenetic 
studies. Steele et al. (2008) used InDel polymorphisms in rice 
to separate Basmati genotypes from other genotypes. OUitrault 
et al. (2012) showed that citrus diversity and phylogenetics 
based on InDel data are consistent with those based on SSR 
markers. 

The observation that InDel markers developed from one mar- 
ket class showed polymorphism in the other market classes 
indicates their broad utility. This denotes that although these 
InDel markers were designed to capture the variation within each 
market class, their performance and application can be expanded 
to the entire bean germplasm. 
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FIGURE 5 I Correspondence between genetic and physical positions. The pink bars are linkage groups and the blue bars are the chromosomes with the 
physical positions of the InDel markers on their right side. The sizes of the chromosomes are proportional to their actual size. 
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FIGURE 6 I Multiplexing of markers on 48 Middle American bean 
genotypes showed distinct bands on the 3% agarose gel 
electrophoresis. (A) Amplification products using InDel markers 
NDSU_IND_05_372272, NDSU_IND_06_16.5002, NDSU_INDJ1_30.9655, 
and NDSU_IND_06_31.8021 on 48 bean genotypes. All four markers showed 



polymorphism. (B) Amplification products using InDel markers 
NDSU_IND_11_33.0572, NDSU_IND_07_42.1709, NDSU_IND_07_25.192B, 
and NDSU_IND_10_19.1957 on the same 48 bean genotypes. Marker 
NDSU_IND_07_25.1928 was monomorphic and the other three were 
polymorphic. The first lane from the right in (A,B) are the DNA Ladders. 



Table 8 | Specifications of InDel markers used for multiplexing (two 
sets of fourplex). 



Marker 


Market 


Max-product 


Min-product 


InDel 


ID 


class 


size (bp) 


size (bp) 


size (bp) 


NDSU_IND_05_372272* 


Pinto 


78 


70 


8 


NDSU_IND_06_16.5002* 


Black 


141 


124 


17 


NDSU_IND_11_30.9655* 


Pinto 


177 


160 


17 


NDSU_IND_06_31.8021* 


Pinto 


220 


198 


22 


NDSU_IND_11_33.0572+ 


Pinto 


76 


67 


9 


NDSU_IND_07_42.1709+ 


Pinto 


114 


99 


15 


NDSU_IND_07_25.1928+ 


Black 


153 IH 






NDSU_IND_10_19.1957+ 


Pinto 


197 


175 


22 



Marker shaded in gray was monomorphic. 

'One fourpiex marl<er set mixed in one PCR reaction. 

+ The second fourplex marker set mixed in one PCR reaction. 



The InDel markers should be useful for genetic map con- 
struction because there are on average about 200 markers on 
each chromosome. We used a relatively small mapping popula- 
tion to illustrate their application for linkage analysis. Generally, 
recombination occurs more frequently in regions distal to the 
centromeric region (Curtis and Lukaszewski, 1991; Tanksley et al., 
1992; Werner et al., 1992). Because of low marker density and a 
small number of recombination events, our map did not cover the 
centromeric blocks in the F2 mapping study, resulting in map- 
ping only a portion of the chromosome or of two clusters of 
markers, one from each arm. There were five discrepancies in our 
genetic map relative to the physical map. Marker order differences 
between the genetic and physical map or genetic maps from dif- 
ferent populations or marker systems has been observed in other 
studies as well (Snelling et al, 2007; Wei et al, 2007; Xia et al, 
2007). These differences could be a result of sequence assembly 
errors, inversions, and segregation distortion. 

The possibility of conducting multiplex PCR is another indi- 
cator of the broad utility of InDel markers. With multiplex- 
ing, genotyping is even more cost effective due to reduced 



amount of reagents and DNA quantity needed for PCR ampli- 
fication. Moreover, this method saves time when hundreds of 
markers are screened and broadens the coverage when DNA 
availability is limited (Edwards and Gibbs, 1994; Karaiskou and 
Primmer, 2008). Our InDel markers meet many of the crite- 
ria that Henegariu et al. (1997) mentioned as critical parame- 
ters in a multiplex PCR. According to their study, some of the 
basic principles include the appropriate primer length which 
should be 18-34 bp or higher and all of our primers are 
designed with a length of 26-32 bp. Henegariu et al. (1997) 
also reported that by increasing the primer length up to 28- 
30 bp, the annealing temperature could be increased result- 
ing in a reduction of non-specific PCR products. GC content 
of 35-60% and annealing temperature between 55 and 58°C 
are other basic principles of multiplex PCR. The primers we 
designed have a minimum GC content of 35%, and all amplify 
at 55°C. Henegariu et al. (1997) indicated 54°C as the opti- 
mum temperature for co-amplification of loci in the multiplex 
PCR. Although the probability of non-specific product amplifi- 
cation increases at this temperature, simultaneous amplification 
of many specific loci greatly suppresses the yield of non-specific 
amplification products due to limited enzyme and nucleotide 
resources. 

In conclusion, this study shows the usefulness of DNA 
sequence data as the raw material for primer development in 
the presence or absence of a reference genome. We show that 
contigs obtained from the de novo assembly of sequence data 
are sufficient for polymorphism discovery. However, without 
a completely assembled reference genome or a set of primary 
scaffolds, contigs cannot be filtered to eliminate the duplicate 
loci or transposable elements. The reference sequence could 
reduce the development of redundant markers and allow the 
determination of the exact physical position and order of the 
markers. The availability of high density markers affects the suc- 
cess of genetic map construction, map-based cloning (Pacurar 
et al., 2012) and diversity studies. The availability of 2687 InDel 
primers will enhance MAS and diversity studies in common 
bean. 
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